Avoid blocking in BigQuery split and page sources #23741

Merged
merged 5 commits into from
Oct 17, 2024

Conversation

nineinchnick
Member

Description

Avoid blocking in the BigQuery split source and page source constructors. Both constructors execute queries, so this change starts those queries in a background thread instead of running them inline. This should reduce congestion on busy clusters.
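
To illustrate the idea, here is a minimal sketch of a source whose constructor only schedules the query and returns immediately. `AsyncRowSource` and its methods are hypothetical stand-ins written against the JDK only; they are not the actual Trino or BigQuery connector classes, and the real change may be structured differently.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

// Hypothetical stand-in for a page source; not the connector's real class.
final class AsyncRowSource
{
    private final CompletableFuture<List<String>> result;
    private boolean finished;

    AsyncRowSource(ExecutorService executor, Supplier<List<String>> query)
    {
        // The constructor only schedules the query; it runs on an executor thread.
        this.result = CompletableFuture.supplyAsync(query, executor);
    }

    // Callers can wait on this future instead of blocking a worker thread.
    CompletableFuture<?> isBlocked()
    {
        return result;
    }

    // Returns null while the query is still running or after the rows were handed out.
    List<String> getNextBatch()
    {
        if (finished || !result.isDone()) {
            return null;
        }
        finished = true;
        return result.join();
    }

    boolean isFinished()
    {
        return finished;
    }
}
```

The caller polls `getNextBatch()` and, when it returns null, parks on `isBlocked()` rather than tying up a thread inside the constructor.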

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Oct 10, 2024
@github-actions github-actions bot added the bigquery BigQuery connector label Oct 10, 2024
    @ForBigQuery
    public ExecutorService provideExecutor(CatalogName catalogName)
    {
        return newCachedThreadPool(daemonThreadsNamed("bigquery-" + catalogName + "-%s"));
    }
Member Author

We should probably have an upper bound on the number of threads here, but I wasn't sure whether we can come up with a constant value or whether it should be configurable.

There's also another executor right above, but I didn't want to mix them up. The naming can be improved here for sure, but I also didn't have good ideas.
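
For reference, one way to cap the thread count is sketched below using only JDK classes. `BoundedExecutors`, `newBoundedDaemonPool`, and the `maxThreads` parameter are hypothetical names for illustration; whether the limit should be a constant or a configuration property is left open, as in the comment above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; not part of the connector.
final class BoundedExecutors
{
    private BoundedExecutors() {}

    static ExecutorService newBoundedDaemonPool(String namePrefix, int maxThreads)
    {
        AtomicInteger counter = new AtomicInteger();
        // Daemon threads with a numbered name, similar in spirit to the "-%s" pattern above.
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, namePrefix + "-" + counter.getAndIncrement());
            thread.setDaemon(true);
            return thread;
        };
        // Core size equals max size, so at most maxThreads run; extra tasks wait in the queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads,
                maxThreads,
                60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(),
                factory);
        // Let idle threads exit, so an idle pool shrinks like a cached pool does.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
```

With an unbounded queue, tasks beyond the limit are delayed rather than rejected, which trades latency for a predictable thread count.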

@hashhar hashhar requested a review from raunaqmorarka October 10, 2024 11:20
@ebyhr
Member

ebyhr commented Oct 15, 2024

/test-with-secrets sha=f76016e22128a175fbf1e08a11efdac748f27ab6

github-actions bot commented Oct 15, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/11337589498

@nineinchnick nineinchnick force-pushed the bigquery-nonblocking-pagesource branch 3 times, most recently from 2e4ef46 to d3c123a Compare October 15, 2024 14:59
@nineinchnick
Member Author

@ebyhr please run the tests again, I ran them locally too.

@ebyhr
Member

ebyhr commented Oct 15, 2024

/test-with-secrets sha=d3c123a03207caaa617346bc7f1d795fa49a6da8

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/11355050648

Comment on lines +75 to +78
private List<BigQuerySplit> splits;
private int offset;
Member

You could keep the splits as an iterator to avoid tracking the offset explicitly.

Member Author

Wouldn't that complicate getting the next batch? Is there an existing example somewhere of how to do that elegantly? I think we've extended the scope of this PR enough.
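
As a rough illustration of the iterator suggestion, the sketch below hands out batches without an explicit offset field. `SplitBatcher` and its methods are hypothetical and use only JDK classes; this is not code from the PR, which kept the list and offset.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper showing batch retrieval from an iterator.
final class SplitBatcher<T>
{
    private final Iterator<T> remaining;

    SplitBatcher(List<T> splits)
    {
        // Copy defensively so later changes to the caller's list do not affect iteration.
        this.remaining = List.copyOf(splits).iterator();
    }

    // Returns up to maxSize splits; the iterator itself remembers the position.
    List<T> nextBatch(int maxSize)
    {
        List<T> batch = new ArrayList<>(maxSize);
        while (batch.size() < maxSize && remaining.hasNext()) {
            batch.add(remaining.next());
        }
        return batch;
    }

    boolean isFinished()
    {
        return !remaining.hasNext();
    }
}
```

The loop replaces the `subList(offset, offset + size)` style of bookkeeping, at the cost of copying each element into a new batch list.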

@nineinchnick nineinchnick force-pushed the bigquery-nonblocking-pagesource branch 4 times, most recently from 0894daa to 3d269a4 Compare October 16, 2024 10:35
@raunaqmorarka raunaqmorarka requested a review from ebyhr October 16, 2024 10:38
@nineinchnick
Member Author

@raunaqmorarka or @ebyhr, can you run tests with secrets? @ebyhr would you also like to review this PR?

@nineinchnick nineinchnick force-pushed the bigquery-nonblocking-pagesource branch 2 times, most recently from ed1eba5 to 4d50d8f Compare October 16, 2024 11:27
@ebyhr
Member

ebyhr commented Oct 16, 2024

/test-with-secrets sha=4d50d8fc942292a36afee39b4916582c2a3ec40f

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/11364715101

@ebyhr
Member

ebyhr commented Oct 17, 2024

Could you rebase on master to resolve conflicts?

@nineinchnick nineinchnick force-pushed the bigquery-nonblocking-pagesource branch from 4d50d8f to 27efd24 Compare October 17, 2024 08:09
@ebyhr ebyhr merged commit baf1eeb into trinodb:master Oct 17, 2024
19 checks passed
@github-actions github-actions bot added this to the 463 milestone Oct 17, 2024
@nineinchnick nineinchnick deleted the bigquery-nonblocking-pagesource branch October 31, 2024 10:05
3 participants